feat(swe): add Qwen3-30B SWE-bench async-GRPO recipe (vLLM + SGLang) #2961

Closed
Kh4L wants to merge 1 commit into NVIDIA-NeMo:main from Kh4L:sglang-swe2-recipe

Conversation

Kh4L commented Jun 26, 2026

What

Adds the multi-turn SWE-bench agentic async-GRPO recipe for Qwen3-30B-A3B-Thinking (MoE, 30B total / ~3B active):

  • examples/swe_bench/grpo_qwen3_30b_async_swe.yaml — the recipe config
  • examples/swe_bench/run_grpo_repro_baseline_swe2.sh — vLLM baseline launcher (reproduces the ~8%-resolved reference run)
  • examples/swe_bench/run_grpo_swe2_scale_gen.sh — generation-scaling sweep launcher with a BACKEND=vllm|sglang switch
  • examples/swe_bench/REPRO_swe2.md — vLLM baseline reproduction guide
  • examples/swe_bench/REPRO_swe2_sglang.md — SGLang reproduction guide

Why

Provides a reproducible reference for multi-turn SWE-bench RL (baseline ~8% resolved from step 1) and a working SGLang generation path at parity with vLLM — rollout completeness, throughput, and training-grade per-token logprob parity.

Status — draft, depends on #2447

The BACKEND=sglang path needs the enhanced SGLang backend (Megatron→SGLang MoE/PP weight-refit, router, fault-tolerance) from #2447, which is not yet merged. On current main's basic SGLang backend the SGLang path will not run the 30B-MoE recipe; the vLLM path is self-contained. Kept as a draft until #2447 lands. The companion gym-proxy token-splicing contiguity fix (required for multi-turn SGLang) is in NVIDIA-NeMo/Gym#1787.
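To make the contiguity requirement concrete, here is a minimal sketch of the kind of check a multi-turn rollout could be put through. It assumes that "contiguity" means each turn's prompt token IDs begin with the previous turn's prompt plus its generated tokens; that reading, the `(prompt_ids, generated_ids)` layout, and the function name `first_contiguity_failure` are illustrative assumptions, not the actual check in NVIDIA-NeMo/Gym#1787.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: one (prompt_ids, generated_ids) pair per turn.
Turn = Tuple[Sequence[int], Sequence[int]]


def first_contiguity_failure(turns: List[Turn]) -> Optional[int]:
    """Return the index of the first turn whose prompt does not start with
    the previous turn's prompt + generated tokens, or None if all turns
    are contiguous under this (assumed) definition."""
    for i in range(1, len(turns)):
        prev_prompt, prev_generated = turns[i - 1]
        expected_prefix = list(prev_prompt) + list(prev_generated)
        prompt = list(turns[i][0])
        if prompt[: len(expected_prefix)] != expected_prefix:
            return i
    return None


# Toy token IDs, not real rollout data.
contiguous = [([1, 2], [3, 4]), ([1, 2, 3, 4, 9], [5])]
respliced = [([1, 2], [3, 4]), ([1, 2, 3, 7, 9], [5])]  # token 4 re-tokenized as 7
```

With this sketch, `first_contiguity_failure(contiguous)` returns `None`, while `first_contiguity_failure(respliced)` returns `1`, the turn where the spliced history no longer matches what was generated.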

Validation

  • Port parity: SGLang multi-turn rollouts 8/8, contiguity failures 0, ~193 gen tok/s with full CUDA graph (≈ vLLM).
  • Logprob parity (teacher-forced, 27,493 tokens): median |Δ| 1.38e-3; cross-engine median ≈ the within-engine bf16/MoE noise floor (1.24e-3) — i.e. vLLM differs from SGLang no more than SGLang differs from itself. Details in REPRO_swe2_sglang.md.
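The logprob-parity figure above can be illustrated with a short sketch: given two engines' teacher-forced per-token logprobs over the same token sequence, compute the median absolute difference and compare it with a within-engine noise floor. The function names and the `tolerance` factor are assumptions for illustration; the measurement procedure actually used is the one described in REPRO_swe2_sglang.md.

```python
import statistics
from typing import List, Sequence


def abs_deltas(logprobs_a: Sequence[float], logprobs_b: Sequence[float]) -> List[float]:
    """Per-token |delta| between two teacher-forced logprob sequences."""
    if len(logprobs_a) != len(logprobs_b):
        raise ValueError("both sequences must score the same tokens")
    return [abs(a - b) for a, b in zip(logprobs_a, logprobs_b)]


def median_abs_delta(logprobs_a: Sequence[float], logprobs_b: Sequence[float]) -> float:
    """Median of the per-token absolute logprob differences."""
    return statistics.median(abs_deltas(logprobs_a, logprobs_b))


def within_noise_floor(cross_engine: float, within_engine: float,
                       tolerance: float = 1.5) -> bool:
    """True if the cross-engine median is within `tolerance` times the
    within-engine median. The factor 1.5 is an arbitrary illustrative choice."""
    return cross_engine <= tolerance * within_engine


# Toy values, not measurements from the PR.
engine_a = [-0.100, -1.200, -0.050, -2.300]
engine_b = [-0.101, -1.198, -0.052, -2.301]
toy_median = median_abs_delta(engine_a, engine_b)
```

Plugging in the reported medians, `within_noise_floor(1.38e-3, 1.24e-3)` evaluates to `True` under the assumed tolerance, which matches the claim that the cross-engine difference is about the same size as the within-engine bf16/MoE noise.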

Reproduced end-to-end from a clean clone.

Signed-off-by: Serge Panev <spanev@nvidia.com>

copy-pr-bot Bot commented Jun 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Kh4L commented Jun 26, 2026

Author

Closing: this draft inadvertently contained internal cluster/filesystem paths. Will re-open a sanitized version.

Kh4L closed this Jun 26, 2026
Kh4L deleted the sglang-swe2-recipe branch June 26, 2026 21:54